H2N2 exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Clustering the Normalized Compression Distance for Influenza Virus Data

Internal identifier: 000F68 (Main/Exploration); previous: 000F67; next: 000F69

Authors: Kimihito Ito [Japan]; Thomas Zeugmann [Japan]; Yu Zhu [Japan]

Source:

RBID : ISTEX:F78AFAB7DE19FE4803D32A2D6BFCCB44795CFF08

Abstract

The present paper analyzes the usefulness of the normalized compression distance for the problem to cluster the hemagglutinin (HA) sequences of influenza virus data for the HA gene in dependence on the available compressors. Using the CompLearn Toolkit, the built-in compressors zlib and bzip2 are compared. Moreover, a comparison is made with respect to hierarchical and spectral clustering. For the hierarchical clustering, hclust from the R package is used, and the spectral clustering is done via the kLine algorithm proposed by Fischer and Poland (2004). Our results are very promising and show that one can obtain an (almost) perfect clustering. It turned out that the zlib compressor allowed for better results than the bzip2 compressor and, if all data are concerned, then hierarchical clustering is a bit better than spectral clustering via kLines.
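
To make the quantity named in the abstract concrete, here is a minimal sketch of the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. It is not the authors' setup: the paper uses the CompLearn Toolkit, whereas this sketch assumes Python's standard zlib and bz2 modules as stand-ins for the two compressors, and the function names ncd and distance_matrix are hypothetical.

```python
import bz2
import zlib
from typing import Callable, List, Sequence

# Any function mapping bytes to compressed bytes can serve as the compressor.
Compressor = Callable[[bytes], bytes]


def ncd(x: bytes, y: bytes, compress: Compressor = zlib.compress) -> float:
    """Normalized compression distance between two byte strings."""
    cx = len(compress(x))        # C(x)
    cy = len(compress(y))        # C(y)
    cxy = len(compress(x + y))   # C(xy), the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(
    seqs: Sequence[bytes], compress: Compressor = zlib.compress
) -> List[List[float]]:
    """Pairwise NCD matrix.

    Assumption of this sketch: the diagonal is set to 0 and each pair is
    computed once and mirrored, so the result is symmetric even though a
    real compressor does not guarantee NCD(x, y) == NCD(y, x).
    """
    n = len(seqs)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = ncd(seqs[i], seqs[j], compress)
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

Passing bz2.compress as the second argument switches the compressor, which is the kind of comparison the abstract describes. The resulting matrix could then be handed to a hierarchical clustering routine such as the hclust function the abstract mentions.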

Url:
DOI: 10.1007/978-3-642-12476-1_9


Affiliations:


Links to previous steps (curation, corpus...)


The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Clustering the Normalized Compression Distance for Influenza Virus Data</title>
<author>
<name sortKey="Ito, Kimihito" sort="Ito, Kimihito" uniqKey="Ito K" first="Kimihito" last="Ito">Kimihito Ito</name>
</author>
<author>
<name sortKey="Zeugmann, Thomas" sort="Zeugmann, Thomas" uniqKey="Zeugmann T" first="Thomas" last="Zeugmann">Thomas Zeugmann</name>
</author>
<author>
<name sortKey="Zhu, Yu" sort="Zhu, Yu" uniqKey="Zhu Y" first="Yu" last="Zhu">Yu Zhu</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:F78AFAB7DE19FE4803D32A2D6BFCCB44795CFF08</idno>
<date when="2010" year="2010">2010</date>
<idno type="doi">10.1007/978-3-642-12476-1_9</idno>
<idno type="url">https://api.istex.fr/ark:/67375/HCB-2MQPN5CV-Q/fulltext.pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000E18</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">000E18</idno>
<idno type="wicri:Area/Istex/Curation">000E18</idno>
<idno type="wicri:Area/Istex/Checkpoint">000235</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">000235</idno>
<idno type="wicri:doubleKey">0302-9743:2010:Ito K:clustering:the:normalized</idno>
<idno type="wicri:Area/Main/Merge">000F76</idno>
<idno type="wicri:Area/Main/Curation">000F68</idno>
<idno type="wicri:Area/Main/Exploration">000F68</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Clustering the Normalized Compression Distance for Influenza Virus Data</title>
<author>
<name sortKey="Ito, Kimihito" sort="Ito, Kimihito" uniqKey="Ito K" first="Kimihito" last="Ito">Kimihito Ito</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Japon</country>
<wicri:regionArea>Research Center for Zoonosis Control, Hokkaido University, N-20, W-10 Kita-ku, 001-0020, Sapporo</wicri:regionArea>
<wicri:noRegion>Sapporo</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Japon</country>
</affiliation>
</author>
<author>
<name sortKey="Zeugmann, Thomas" sort="Zeugmann, Thomas" uniqKey="Zeugmann T" first="Thomas" last="Zeugmann">Thomas Zeugmann</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Japon</country>
<wicri:regionArea>Division of Computer Science, Hokkaido University, N-14, W-9, Sapporo, 060-0814</wicri:regionArea>
<wicri:noRegion>060-0814</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Japon</country>
</affiliation>
</author>
<author>
<name sortKey="Zhu, Yu" sort="Zhu, Yu" uniqKey="Zhu Y" first="Yu" last="Zhu">Yu Zhu</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Japon</country>
<wicri:regionArea>Division of Computer Science, Hokkaido University, N-14, W-9, Sapporo, 060-0814</wicri:regionArea>
<wicri:noRegion>060-0814</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Japon</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s" type="main" xml:lang="en">Lecture Notes in Computer Science</title>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: The present paper analyzes the usefulness of the normalized compression distance for the problem to cluster the hemagglutinin (HA) sequences of influenza virus data for the HA gene in dependence on the available compressors. Using the CompLearn Toolkit, the built-in compressors zlib and bzip2 are compared. Moreover, a comparison is made with respect to hierarchical and spectral clustering. For the hierarchical clustering, hclust from the R package is used, and the spectral clustering is done via the kLine algorithm proposed by Fischer and Poland (2004). Our results are very promising and show that one can obtain an (almost) perfect clustering. It turned out that the zlib compressor allowed for better results than the bzip2 compressor and, if all data are concerned, then hierarchical clustering is a bit better than spectral clustering via kLines.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Japon</li>
</country>
</list>
<tree>
<country name="Japon">
<noRegion>
<name sortKey="Ito, Kimihito" sort="Ito, Kimihito" uniqKey="Ito K" first="Kimihito" last="Ito">Kimihito Ito</name>
</noRegion>
<name sortKey="Ito, Kimihito" sort="Ito, Kimihito" uniqKey="Ito K" first="Kimihito" last="Ito">Kimihito Ito</name>
<name sortKey="Zeugmann, Thomas" sort="Zeugmann, Thomas" uniqKey="Zeugmann T" first="Thomas" last="Zeugmann">Thomas Zeugmann</name>
<name sortKey="Zeugmann, Thomas" sort="Zeugmann, Thomas" uniqKey="Zeugmann T" first="Thomas" last="Zeugmann">Thomas Zeugmann</name>
<name sortKey="Zhu, Yu" sort="Zhu, Yu" uniqKey="Zhu Y" first="Yu" last="Zhu">Yu Zhu</name>
<name sortKey="Zhu, Yu" sort="Zhu, Yu" uniqKey="Zhu Y" first="Yu" last="Zhu">Yu Zhu</name>
</country>
</tree>
</affiliations>
</record>
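
The record above can also be read programmatically. The following is a rough sketch using Python's standard xml.etree.ElementTree, under one stated assumption: the record as displayed uses the wicri: prefix without declaring it, so a namespace-aware parser would reject it. The sketch therefore injects a placeholder declaration on the root element. The URI urn:example:wicri and the function names parse_record and summarize are hypothetical, not part of Wicri or Dilib.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Union

# Placeholder namespace URI (an assumption of this sketch, not a Wicri value).
WICRI_NS = "urn:example:wicri"


def parse_record(text: str) -> ET.Element:
    """Parse a <record> whose wicri: prefix is not declared in the text."""
    # Bind the prefix on the root element so the parser accepts it.
    patched = text.replace("<record>", f'<record xmlns:wicri="{WICRI_NS}">', 1)
    return ET.fromstring(patched)


def summarize(record: ET.Element) -> Dict[str, Union[Optional[str], List[str]]]:
    """Extract the title, author names, and DOI from a parsed record."""
    analytic = record.find(".//analytic")
    title = analytic.findtext("title") if analytic is not None else None
    authors = (
        [name.text for name in analytic.findall("author/name")]
        if analytic is not None
        else []
    )
    doi = record.findtext(".//publicationStmt/idno[@type='doi']")
    return {"title": title, "authors": authors, "doi": doi}
```

Reading the authors from the analytic element rather than from the affiliations tree avoids the repeated name entries that appear in that tree.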

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/H2N2V1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000F68 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000F68 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    H2N2V1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:F78AFAB7DE19FE4803D32A2D6BFCCB44795CFF08
   |texte=   Clustering the Normalized Compression Distance for Influenza Virus Data
}}

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Tue Apr 14 19:59:40 2020. Site generation: Thu Mar 25 15:38:26 2021